Tbl2KnownGene: A command-line program to convert NCBI.tbl to UCSC knownGene.txt data file

Author

  • Yongsheng Bai
Abstract

The schema for UCSC Known Genes (knownGene.txt) has been widely adopted for use in both standard and custom downstream analysis tools and scripts. For many popular model organisms (e.g. Arabidopsis), sequence and annotation data tables (including "knownGene.txt") have not yet been made publicly available. This article therefore describes Tbl2KnownGene, a .tbl file parser that processes the contents of an NCBI .tbl file and produces a UCSC Known Genes annotation feature table. The algorithm was tested with chromosome datasets from the Arabidopsis genome (TAIR10). The parser should also be applicable to other organisms with similar .tbl annotations. AVAILABILITY: Perl scripts and required input files are available on the web at http://thoth.indstate.edu/~ybai2/Tbl2KnownGene/index.html.
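To make the conversion described in the abstract concrete, the following is a minimal Python sketch of the general idea. It is not the author's Perl implementation. It assumes the usual NCBI feature-table layout (a `>Feature` header, tab-separated `start stop key` lines, bare `start stop` continuation lines, and qualifier lines indented by three tabs) and the 12-column knownGene.txt layout with 0-based, half-open coordinates. The function names `parse_tbl` and `known_gene_rows`, the pairing of each mRNA with the next CDS, the use of the `transcript_id` qualifier as the gene name, and the empty proteinID column are all illustrative assumptions.

```python
def parse_tbl(lines):
    """Yield (seq_id, feature_key, intervals, qualifiers) for each feature."""
    seq_id, feature = None, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            if feature:
                yield feature
                feature = None
            seq_id = line.split()[1]
            continue
        cols = line.split("\t")
        if cols[0]:
            # Interval line; '<' and '>' mark partial ends and are dropped here.
            start = int(cols[0].strip().lstrip("<>"))
            stop = int(cols[1].strip().lstrip("<>"))
            if len(cols) >= 3 and cols[2]:
                # A third column names a new feature (gene, mRNA, CDS, ...).
                if feature:
                    yield feature
                feature = (seq_id, cols[2], [(start, stop)], {})
            elif feature:
                feature[2].append((start, stop))
        elif len(cols) >= 4 and feature:
            # Qualifier line: three empty columns, then key and optional value.
            feature[3][cols[3]] = cols[4] if len(cols) > 4 else ""
    if feature:
        yield feature


def _blocks(intervals):
    """Convert 1-based .tbl intervals to sorted 0-based half-open blocks."""
    return sorted((min(a, b) - 1, max(a, b)) for a, b in intervals)


def known_gene_rows(features):
    """Yield knownGene-style rows, pairing each mRNA with the next CDS."""
    pending = None  # most recent mRNA still waiting for its CDS (assumption)
    for seq_id, key, intervals, quals in features:
        if key == "mRNA":
            pending = (seq_id, intervals, quals)
        elif key == "CDS" and pending:
            chrom, exons, mquals = pending
            pending = None
            # In .tbl files a minus-strand interval has start > stop.
            strand = "-" if exons[0][0] > exons[0][1] else "+"
            ex, cds = _blocks(exons), _blocks(intervals)
            name = mquals.get("transcript_id") or mquals.get("product") or "unknown"
            fields = [
                name, chrom, strand,
                ex[0][0], ex[-1][1],           # txStart, txEnd
                cds[0][0], cds[-1][1],         # cdsStart, cdsEnd
                len(ex),                       # exonCount
                "".join(f"{s}," for s, _ in ex),  # exonStarts
                "".join(f"{e}," for _, e in ex),  # exonEnds
                "",                            # proteinID left empty (assumption)
                name,                          # alignID reuses the name (assumption)
            ]
            yield "\t".join(str(f) for f in fields)
```

A possible use would be `for row in known_gene_rows(parse_tbl(open("chr1.tbl"))): print(row)`, where `chr1.tbl` is a hypothetical input file. Features without a matching mRNA/CDS pair are silently skipped in this sketch; a real converter would need a policy for non-coding transcripts and alternative isoforms.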

Related resources

SNPAAMapperT2K: A genome-wide SNP downstream analysis and annotation pipeline for species annotated with NCBI.tbl data files

SNPAAMapper, a genome-wide SNP downstream analysis and annotation pipeline, was designed to classify detected variants according to genomic regions and report the mutation class by processing whole-genome and/or whole-exome sequencing data. A widely used sequence and data annotation table format "knownGene.txt" has not yet been created for many popular model organisms (e.g. Arabidops...

Image Operations using a Semi-compressed Contour Tree Image Definition

The contour tree file format has been used for a few years as a suitable storage format for most image types. The technique stores unique regions in a hierarchical data structure that defines the complete raster image. This data structure is called a contour tree. It compares very favourably with other lossless coding schemes on all image format types, including bi-level and reduced colour i...

PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs

The analysis of genetic data often requires a combination of several approaches using different and sometimes incompatible programs. In order to facilitate data exchange and file conversions between population genetics programs, we introduce PGDSpider, a Java program that can read 27 different file formats and export data into 29, partially overlapping, other file formats. The PGDSpi...

Xtriage and Fest: automatic assessment of X-ray data and substructure structure factor estimation

A command-line utility that allows the user to rapidly assess the quality and specific idiosyncrasies of an X-ray dataset has been developed. The program, called Xtriage, combines the twin analysis tools described in a previous CCP4 newsletter (Zwart et al., 2005) with other data quality indicators. In the following sections, the various steps in the characterization of an X-ray dat...

rat: A Secure Archiving Program With Fast Retrieval

A new archive format called rat was developed. This format was designed to allow very fast retrieval of individual files. This is achieved using a table of contents to quickly find the file. Each file in the archive is individually compressed with a compression method specific to the file. A user-created configuration file is used to specify what type of compression to use on each file based on...

Volume 10

Publication date: 2014